Skip to content

fix: route checkpoint operational logs to stderr#47

Open
Simon-Free wants to merge 6 commits intoSafeRL-Lab:mainfrom
Simon-Free:pr3-checkpoint-stderr-tokens
Open

fix: route checkpoint operational logs to stderr#47
Simon-Free wants to merge 6 commits intoSafeRL-Lab:mainfrom
Simon-Free:pr3-checkpoint-stderr-tokens

Conversation

@Simon-Free
Copy link
Copy Markdown

@Simon-Free Simon-Free commented Apr 17, 2026

Summary

Route the two [checkpoint] operational prints (large-file skip, backup-failure) to sys.stderr instead of stdout. On stdout they pollute the conversation transcript captured by session tooling; on stderr they stay in the operator's view where they belong.

Changes

File +/- What
checkpoint/store.py +3/-2 import sys + file=sys.stderr on the two print() calls in track_file_edit
tests/test_checkpoint_store.py +61 Three behavior tests via capsys: large file skipped (stderr, not stdout), normal file backed up, shutil.copy2 failure routed to stderr
tests/test_checkpoint_e2e.py +112 Two end-to-end scenarios driving agent.run with a mocked LLM that issues a Write tool_call; the Write hook, checkpoint store, and stderr routing all run for real

What was dropped

tests/test_checkpoint_extras.py is deleted. Three stale TestTokenSnapshotExtendedFields cases asserted cache_read / cache_creation fields removed in 620bbb2 per earlier reviewer feedback. The remaining two cases were trivial or file-source text scans; neither tests user behavior.

Scope limit

Only the stderr-routing change is in this PR. The earlier stderr_lines / cache_read / cache_creation snapshot fields mentioned in the previous description are not included: they need producer-side wiring (collecting per-tool stderr into the snapshot, tracking cache tokens from provider responses) which belongs in a follow-up once the plumbing exists.

Ref #43

@Simon-Free Simon-Free marked this pull request as draft April 17, 2026 19:44
@chauncygu chauncygu marked this pull request as ready for review April 17, 2026 22:20
@chauncygu
Copy link
Copy Markdown
Contributor

Thanks for the contribution! I'd like to ask for this PR to be split into two, because the two changes have very different risk profiles.

A few issues first:

  1. Description doesn't match the diff. The PR says it modifies session_store.py and adds optional stderr/tokens fields to add_entry(). The actual diff modifies checkpoint/store.py
    and does something else (redirect existing prints to stderr + add two new keys to token_snapshot). Please update the description so reviewers can follow along.
  2. cache_read / cache_creation are dead fields. The new code reads:
    "cache_read": getattr(state, "total_cache_read_tokens", 0),
    "cache_creation": getattr(state, "total_cache_creation_tokens", 0)
  3. The tests assert on source strings, not behavior. test_cache_read_in_source / test_cache_creation_in_source just grep the source file for the literal "cache_read" / "cache_creation". These don't verify anything is actually captured and will lock in the implementation without catching real regressions.

Proposed split:

  1. Keep the stderr change as its own small PR. Replacing print(...) with print(..., file=sys.stderr) for those [checkpoint] diagnostics is a clean, safe improvement — happy to merge, that quickly on its own.
  2. Drop the token-field change from this PR. To land it properly, we first need to actually track cache tokens on State — i.e., accumulate cache_read_input_tokens / cache_creation_input_tokens from the API response in providers/ (and wherever the running totals live), then persist them in the checkpoint. Otherwise we're adding schema that carries no information. Please open a separate PR for that work once the producer side is in place, with tests that assert real values flow through (not string matches on source).

Regarding the failing CI on 3.10/3.11 — that's a pre-existing infra issue (sqlalchemy isn't in requirements.txt but tests/test_web_api.py imports it). It's unrelated to your changes and is being fixed separately. Once that fix is on main, please click Update branch on this PR (or rebase) so the new ci.yml takes effect.

Thanks again!

@Simon-Free
Copy link
Copy Markdown
Author

Simon-Free commented Apr 18, 2026

I am very sorry for description not matching the diffs, I mistakenly made the PR on cheetahclaws instead of my fork, and thus marked as draft, in order to take the time to manually review, can I undraft them back and ping you whenever I finished to check it ? For both this one and the other

@chauncygu
Copy link
Copy Markdown
Contributor

Sure, Thanks a lot for your great contributions.

Simon-Free pushed a commit to Simon-Free/cheetahclaws that referenced this pull request Apr 20, 2026
The three TestTokenSnapshotExtendedFields cases asserted cache_read /
cache_creation fields that were removed in 620bbb2 ("fix: remove dead
cache_read/cache_creation fields per review"). They have been failing
ever since. Delete test_checkpoint_extras.py -- its remaining cases were
either trivial (test_store_imports_sys checks 'import sys' exists) or
file-source text scans (TestCheckpointPrintsToStderr) which don't test
user behavior.

Add tests/test_checkpoint_e2e.py with two real e2e scenarios:
- Drive agent.run with a mocked LLM that emits a Write tool_call; assert
  the checkpoint hook created a pre-edit backup of the original content.
- Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip
  message lands on stderr only, not stdout. This is the actual
  user-visible contract of PR SafeRL-Lab#47 and covers the full wiring
  agent.run -> Write hook -> checkpoint.store.track_file_edit.

The three behavior tests in test_checkpoint_store.py stay -- they cover
the store function directly via capsys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Simon-Free Simon-Free changed the title feat: add stderr capture and token fields to session checkpoints fix: route checkpoint operational logs to stderr Apr 20, 2026
bot and others added 4 commits April 21, 2026 11:03
The three TestTokenSnapshotExtendedFields cases asserted cache_read /
cache_creation fields that were removed in 620bbb2 ("fix: remove dead
cache_read/cache_creation fields per review"). They have been failing
ever since. Delete test_checkpoint_extras.py -- its remaining cases were
either trivial (test_store_imports_sys checks 'import sys' exists) or
file-source text scans (TestCheckpointPrintsToStderr) which don't test
user behavior.

Add tests/test_checkpoint_e2e.py with two real e2e scenarios:
- Drive agent.run with a mocked LLM that emits a Write tool_call; assert
  the checkpoint hook created a pre-edit backup of the original content.
- Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip
  message lands on stderr only, not stdout. This is the actual
  user-visible contract of PR SafeRL-Lab#47 and covers the full wiring
  agent.run -> Write hook -> checkpoint.store.track_file_edit.

The three behavior tests in test_checkpoint_store.py stay -- they cover
the store function directly via capsys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Simon-Free Simon-Free force-pushed the pr3-checkpoint-stderr-tokens branch from b9bf441 to 4bc4a09 Compare April 21, 2026 09:06
@Simon-Free
Copy link
Copy Markdown
Author

Hey @chauncygu this one looks better to me, but if you have any improvement ideas feel free to share ! Also, i tried to introduce some "e2e" tests (quotes, because we are still mocking LLM + some other things) that simulate the workflow and showcase the fix/feature in action !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants